FINDING SADDLE POINTS OF MOUNTAIN PASS TYPE 
WITH QUADRATIC MODELS ON AFFINE SPACES 

C. H. JEFFREY PANG 

Abstract. The problem of computing saddle points is important in certain problems in numerical partial differential equations and computational chemistry, and is often solved numerically by a minimization problem over a set of mountain passes. We propose an algorithm to find saddle points of mountain pass type to find the bottlenecks of optimal mountain passes. The key step is to minimize the distance between level sets by using quadratic models on affine spaces similar to the strategy in the conjugate gradient algorithm. We discuss parameter choices, convergence results, and how to augment the algorithm to a path based method. Finally, we perform numerical experiments to test the convergence of our algorithm.
1. Introduction

We begin with the definition of a mountain pass.

Definition 1.1. Let $X$ be a topological space, and consider $a, b \in X$. For a function $f : X \to \mathbb{R}$, define an optimal mountain pass $\bar p \in \Gamma(a,b)$ to be a minimizer of the problem
\[ \inf_{p \in \Gamma(a,b)} \; \sup_{0 \le t \le 1} f \circ p(t). \tag{1} \]
Here, $\Gamma(a,b)$ is the set of continuous paths $p : [0,1] \to X$ such that $p(0) = a$ and $p(1) = b$.

The point $\bar x$ is a critical point if $\nabla f(\bar x) = 0$, and the critical point $\bar x$ is a saddle point if it is not a local maximizer or minimizer on $X$. The value $f(\bar x)$ is a critical value if $\bar x$ is a critical point. We say that $\bar x$ is a saddle point of mountain pass type if there is an open set $U$ containing $\bar x$ such that $\bar x$ lies in the closure of two path connected components of $\{x \in U : f(x) < f(\bar x)\}$. In the case where $f$ is smooth and an optimal mountain pass $\bar p : [0,1] \to X$ exists, the point at which $f$ attains its maximum on $\bar p([0,1])$ is a saddle point.

Computing saddle points numerically is important for finding weak solutions to partial differential equations. Some of the theoretical references include



Date: July 22,2011. 

Key words and phrases. Mountain pass algorithm, conjugate gradient, convergence, Krylov subspace, indefinite 
matrix. 


[13, 20, 21, 22, 25]. See also the more accessible reference [11]. The original paper on a mountain pass algorithm for solving partial differential equations is [4], and it treats several semilinear elliptic problems. Particular applications in numerical partial differential equations include finding periodic solutions of a boundary value problem modeling a suspension bridge [6] (introduced by [12]), studying a system of Ginzburg-Landau type equations arising in the thin film model of superconductivity [7], the choreographical 3-body problem [2], and cylinder buckling [10]. Other notable works in computing saddle points for solving numerical partial differential equations include the use of constrained optimization [9], extending the mountain pass algorithm to find saddle points of higher Morse index [14] (see also the theoretical foundations in [19]), extending the mountain pass algorithm to find nonsmooth saddle points [26], and the exploitation of symmetry [23, 24].

The problem of finding saddle points numerically is by now well entrenched in the chemistry curriculum. In transition state theory, the problem of finding the least amount of energy to transition between two stable states is equivalent to finding an optimal mountain pass between these two stable states. The highest point on the optimal mountain pass can then be used to determine the reaction kinetics. The foundations of transition state theory were laid by Marcelin, and important work by Eyring and Polanyi in 1931 and by Pelzer and Wigner a year later established the importance of saddle points in transition state theory. We cite the Wikipedia entry on transition state theory for more on its history and further references. Numerous methods for computing saddle points were suggested through the years, and we refer to [8] for a survey. One software package for computing saddle points in chemistry is Gaussian. Tools for computing transition states are also included in VASP. Though the entire optimal mountain pass is needed for such an application, the process of computing saddle points often gives hints on an optimal mountain pass.

As mentioned in [13], our initial interest in the problem of computing saddle points of mountain pass type comes from computing the distance of a matrix $A \in \mathbb{C}^{n \times n}$ to the closest matrix with repeated eigenvalues (also known as the Wilkinson distance problem).

Many of the prevailing methods of finding an optimal mountain pass make use of the formulation (1) directly and discretize a path in $\Gamma(a,b)$. See [8, 17] for example. This discretized path is perturbed so that the maximum value of $f$ along the path is reduced. The proof of the celebrated mountain pass theorem [1] (which establishes the existence of saddle points under some added conditions) shows that such a strategy allows one to find a saddle point.
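As a rough illustration of the path-based formulation (1), the sketch below evaluates the objective $\sup_t f(p(t))$ on a piecewise-linear discretization of a path. The function name `path_max`, the sample function and the sampling density are illustrative assumptions and are not taken from the cited methods.

```python
def path_max(f, nodes, samples_per_segment=50):
    """Approximate the maximum of f along the piecewise-linear path through `nodes`.

    Returns (largest sampled value, point where it was sampled).
    """
    best_val, best_pt = f(nodes[0]), tuple(nodes[0])
    for a, b in zip(nodes, nodes[1:]):
        for k in range(1, samples_per_segment + 1):
            t = k / samples_per_segment
            pt = tuple((1 - t) * ai + t * bi for ai, bi in zip(a, b))
            val = f(pt)
            if val > best_val:
                best_val, best_pt = val, pt
    return best_val, best_pt


# Illustrative function with a saddle point at the origin (critical value 0).
def f(x):
    return x[0] ** 2 - x[1] ** 2


a, b = (0.0, -1.0), (0.0, 1.0)             # endpoints in different components of {f < 0}
direct = path_max(f, [a, b])               # straight path through the saddle point
detour = path_max(f, [a, (0.5, 0.0), b])   # path bent away from the saddle point
```

In this toy example the straight path attains a smaller maximum than the detour, which is the quantity a path-perturbation method tries to reduce.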

We recall the classical theory of numerical optimization to get some ideas on how to design 
algorithms for the mountain pass problem. Global optimization is provably difficult without 
additional assumptions, so one looks at the local theory. Optimization algorithms are then judged 
based on how well they perform once the iterates get close to the minimizer. The global mountain 
pass problem is also provably difficult, so once again we look at local methods. For a local 
theory of the mountain pass problem, observe that the saddle points of mountain pass type can be seen as the bottlenecks through which an optimal mountain pass has to pass. The process of identifying such bottlenecks can then give clues to how an optimal mountain pass can be constructed. Algorithms for finding saddle points of mountain pass type can therefore be judged based on how well they perform once they get close to the saddle point.

For a value $l \in \mathbb{R}$, we say that $\{x \in X \mid f(x) \le l\}$ is a level set. The idea of using level sets to establish lower bounds for critical values and to successively find the closest points in the level sets to estimate the saddle point was proposed in [16] and revisited in [13] (written without knowledge of [16]). Suppose $\{l_i\}$ is an increasing sequence and the sequences of points $\{x_i\}$ and $\{y_i\}$ are such that $x_i$ and $y_i$ lie in different components of the level set $\{x \mid f(x) \le l_i\}$. If the sequences $\{x_i\}$ and $\{y_i\}$ have a common limit, then this common limit is a critical point. Using level sets has several computational advantages. Only two points are needed at any time during the computations instead of a discretized path. Much of the computational effort is then performed near the saddle point, which can be seen as a bottleneck that all optimal mountain passes must pass. The distance $\|x_i - y_i\|$ gives an indication of the progress of the algorithm. Lastly, it was proven in [13] that, provided black boxes for finding closest points to components of the level set and for the minimization of the function $f$ on an affine space exist, we have local superlinear convergence to the saddle point. Figure 1 contrasts the two strategies for finding saddle points of mountain pass type.





Figure 1. The diagram on the left shows the classical method of perturbing paths for the mountain pass problem, while the diagram on the right shows convergence to the critical point by looking at level sets, as was done in [13] and this paper.



Another technique we borrow from optimization theory is to make use of the fact that the function $f$ has a quadratic approximation wherever it is smooth, and in particular at the critical points. The quadratic approximation is the basis on which Newton methods, quasi-Newton methods and the conjugate gradient algorithms are derived. All known algorithms achieving fast convergence (i.e., quadratic, superlinear, or linear convergence) are among the above-mentioned algorithms, trying to find $x$ such that $\nabla f(x) = 0$ by solving the associated linear system obtained from the quadratic approximation. Any algorithm that can converge fast to the saddle point should be similar in some way to the above-mentioned algorithms.

The analysis in [13] uses the following approximation of $f$ at the saddle point $\bar x$:
\[ f(x) = f(\bar x) + \tfrac{1}{2}(x - \bar x)^T H(\bar x)(x - \bar x) + o(\|x - \bar x\|^2). \tag{2} \]

To simplify our analysis, we assume that $X = \mathbb{R}^n$ throughout so that $f : \mathbb{R}^n \to \mathbb{R}$. A common assumption in finding saddle points of mountain pass type is that of nondegeneracy. The saddle point $\bar x$ is said to be nondegenerate if the Hessian $H(\bar x)$ is nonsingular. The more restrictive $\mathcal{C}^{2+}$ condition, equivalent to the Hessian mapping $H : \mathbb{R}^n \to \mathbb{R}^{n \times n}$ being locally Lipschitz at $\bar x$, can be realistically assumed for many practical problems.

The smoothness of $f$ at a critical point $\bar x$ gives the approximation (2), and a similar approximation can be written for the gradient $\nabla f$. For the analysis in this paper, we concentrate on the theory of finding saddle points in the case where the Hessian $H(\cdot)$ is constant. This is equivalent to assuming that $f : \mathbb{R}^n \to \mathbb{R}$ is defined by $f(x) = \frac{1}{2}x^T H x + g^T x + c$ and ignoring the higher order terms, which is analogous to the textbook analysis of the steepest descent and conjugate gradient algorithms for optimization in quadratic problems. These assumptions simplify much of the analysis and bring out the main ideas of how an iterative method can find the saddle point with preferably far fewer than $n$ iterations. Since the textbook analysis of conjugate gradient algorithms discusses only the exact quadratic case and not the smooth case, we shall only analyze the exact quadratic case in this paper. There are some parallels between the saddle point problem surveyed in [3] and the material in this paper.

The following fact about saddle points of mountain pass type will also be used throughout. The Morse index of a nondegenerate critical point is the number of negative eigenvalues of its Hessian.
Figure 2. Consider $f : \mathbb{R}^3 \to \mathbb{R}$ defined by $f(x) = -x_1^2 + x_2^2 + x_3^2$, which has a critical point at $0$ and critical level $0$. Let $l < 0$. In our algorithm, we first find the closest points of the two components of $\{x : f(x) \le l\}$ on a line. Then through information obtained from the gradients and function values, we approximate the behavior of $f$ on a larger affine space and find pairs of points closer to each other in the respective components. This process continues until we have found points sufficiently close to each other.



Fact 1.2. (Morse index one) Suppose $f$ is $\mathcal{C}^2$ and $\bar x$ is a saddle point of mountain pass type. If the Hessian $H(\bar x) \in \mathbb{R}^{n \times n}$ is invertible, then $H(\bar x)$ has Morse index one. That is, the Hessian has $n-1$ positive eigenvalues and one negative eigenvalue.

The main strategy in this paper to find saddle points of mountain pass type is as follows. Assume that $f$ has a quadratic approximation near the saddle point $\bar x$. Let $l_i$ be a lower bound on the critical value $f(\bar x)$. Through evaluations of $f$ near $\bar x$, we can find the behavior of $f$ on successively larger affine spaces. Such a strategy is analogous to the conjugate gradient algorithm. From these better estimates, we can approximately find the closest pair of points in the respective components of the level set $\{x : f(x) \le l_i\}$. This procedure addresses a difficulty in [13], and is illustrated in Figure 2 and elaborated in Sections 3 and 4 in particular. We can increase the level $l_i$ till $l_i$ is sufficiently close to the critical level $f(\bar x)$, and thus find the critical point $\bar x$.

We outline the sections in this paper. After building the needed background on quadratic models in Section 2, we propose an algorithm in Section 3 to find saddle points of mountain pass type using quadratic approximations on affine spaces of the level set. Sections 4 and 5 explain the choices of $d_2$ and $l_i$ in the algorithms in Section 3. In Section 6, we explain briefly how our algorithm can be augmented into a path-based algorithm. We prove some convergence results of our methods in Section 7, and show how our algorithms perform for quadratic problems in our numerical experiments in Section 8. These numerical experiments give an indication of how the algorithm can perform once it gets close to the saddle point.



1.1. Notation. We denote the set of symmetric matrices in $\mathbb{R}^{n \times n}$ by $S^n$. The lineality space of an affine space $A$ passing through a point $z$ is equal to the subspace $A - z$.

2. Quadratic models

For $f : \mathbb{R}^n \to \mathbb{R}$, the second order Taylor approximation motivates the quadratic model $\frac{1}{2}x^T H x + g^T x + c$ to describe the behavior of $f$ near a critical point $\bar x$. This section discusses issues related to quadratic models.

We begin with the following elementary result. We say that $f : \mathbb{R}^n \to \mathbb{R}$ is a quadratic with unknown parameters if $f(x) = \frac{1}{2}x^T H x + g^T x + c$ for unknown parameters $H \in S^n$, $g \in \mathbb{R}^n$ and $c \in \mathbb{R}$. We say that $d+1$ points are in general position if the affine space containing these points has dimension $d$.

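Proposition 2.1. (Determining quadratic models) Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is an exact quadratic with unknown parameters. Suppose $L \in \mathbb{R}^{n \times d}$ has linearly independent columns. To determine $\tilde H \in S^d$, $\tilde g \in \mathbb{R}^d$ and $\tilde c \in \mathbb{R}$ such that
\[ \tilde f(v) := \tfrac{1}{2} v^T \tilde H v + \tilde g^T v + \tilde c = f(Lv + x) \quad \text{for all } v \in \mathbb{R}^d, \]
we need the value $\tilde f(v_i)$ and the gradient $\nabla \tilde f(v_i)$ for $d$ points $v_1, \dots, v_d$ in $\mathbb{R}^d$, and $\tilde f(v_{d+1})$, where $\{v_1, \dots, v_d, v_{d+1}\}$ are in general position.

Proof. The problem is equivalent to determining $\tilde H$, $\tilde g$ and $\tilde c$ such that
\[ \tilde f(v) = \tfrac{1}{2}(v - v_1)^T \tilde H (v - v_1) + \tilde g^T (v - v_1) + \tilde c. \]
The gradient of $\tilde f$ is $\nabla \tilde f(v) = \tilde H(v - v_1) + \tilde g$. Clearly $\tilde c = \tilde f(v_1) = f(Lv_1 + x)$ and $\tilde g = \nabla \tilde f(v_1) = L^T \nabla f(Lv_1 + x)$.

With an orthogonal transformation, we can assume that the span of $\{v_2 - v_1, v_3 - v_1, \dots, v_d - v_1\}$ is equal to the span of the first $d-1$ elementary vectors. Through
\[ \tilde H(v_i - v_1) = \nabla \tilde f(v_i) - \nabla \tilde f(v_1) = L^T[\nabla f(Lv_i + x) - \nabla f(Lv_1 + x)] \quad \text{for } i = 2, \dots, d, \]
we can determine $\tilde H_{i,j}$ for $1 \le i \le d-1$ and $1 \le j \le d$. Through the symmetry of $\tilde H$, we can determine all entries of $\tilde H$ except for $\tilde H_{d,d}$. Since the points $\{v_1, \dots, v_{d+1}\}$ are in general position, $v_{d+1} - v_1$ must have a nonzero $d$-th component. With
\[ \tilde f(v_{d+1}) = \tfrac{1}{2}(v_{d+1} - v_1)^T \tilde H (v_{d+1} - v_1) + \tilde g^T (v_{d+1} - v_1) + \tilde c, \]
we can determine the value of $\tilde H_{d,d}$. $\square$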
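The counting argument in Proposition 2.1 can be sketched for $d = 2$ as follows. This is a minimal illustration under the assumption that the sample points are $v_1$, $v_1 + s e_1$ and $v_1 + s e_2$; the function names and the test quadratic are hypothetical and chosen only for the example.

```python
def fit_quadratic_2d(f, grad, v1, step=1.0):
    """Recover (H, g, c) with f(v) = 0.5 (v-v1)^T H (v-v1) + g^T (v-v1) + c.

    Uses f and grad at v1, grad at v2 = v1 + step*e1, and only f at
    v3 = v1 + step*e2, mirroring the data count in Proposition 2.1 for d = 2.
    """
    c = f(v1)
    g = grad(v1)
    v2 = (v1[0] + step, v1[1])
    g2 = grad(v2)
    # H (v2 - v1) = grad(v2) - grad(v1) gives the first column of H.
    h11 = (g2[0] - g[0]) / step
    h12 = (g2[1] - g[1]) / step
    # The remaining entry H_22 comes from one extra function value.
    v3 = (v1[0], v1[1] + step)
    h22 = 2.0 * (f(v3) - c - g[1] * step) / step ** 2
    return ((h11, h12), (h12, h22)), g, c


def model_value(H, g, c, v1, v):
    """Evaluate the fitted model at v."""
    w = (v[0] - v1[0], v[1] - v1[1])
    Hw = (H[0][0] * w[0] + H[0][1] * w[1], H[1][0] * w[0] + H[1][1] * w[1])
    return 0.5 * (w[0] * Hw[0] + w[1] * Hw[1]) + g[0] * w[0] + g[1] * w[1] + c


# Hypothetical indefinite quadratic with Hessian [[2, 1], [1, -3]].
def f(v):
    x, y = v
    return x * x + x * y - 1.5 * y * y + x - y + 0.5


def grad(v):
    x, y = v
    return (2 * x + y + 1, x - 3 * y - 1)


v1 = (0.5, -1.0)
H, g, c = fit_quadratic_2d(f, grad, v1)
```

For an exact quadratic the fitted model agrees with $f$ everywhere, which can be checked by comparing `model_value` with `f` at any other point.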

The next result describes the behavior of the level sets of a quadratic near a saddle point.

Proposition 2.2. (Optimality in quadratic models) Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is defined by $f(x) = \frac{1}{2}x^T H x + g^T x + c$, where $H$ has eigenvalues $\{\lambda_i\}_{i=1}^n$ arranged in decreasing order, with $\lambda_i > 0$ for $1 \le i \le n-1$ and $\lambda_n < 0$.

(1) The critical point of $f$ is $-H^{-1}g$, and the critical level of $f$ is $c - \frac{1}{2}g^T H^{-1} g$.

(2) For a level $l < c - \frac{1}{2}g^T H^{-1} g$, the level set $\{x' \in \mathbb{R}^n : f(x') \le l\}$ consists of two convex components. The points $x$ and $y$ defined by
\[ -H^{-1}g \pm \sqrt{\frac{2(l - c) + g^T H^{-1} g}{\lambda_n}}\; q_n, \]
where $q_n$ is the eigenvector of unit length corresponding to the negative eigenvalue of $H$, are the minimizers of
\[ \min \|x - y\| \quad \text{s.t. } x \text{ and } y \text{ are in different components of } \{x' \in \mathbb{R}^n : f(x') \le l\}. \]

Proof. Part (1) follows by noticing that $\nabla f(x) = Hx + g$ and writing $f(x)$ as
\[ f(x) = \tfrac{1}{2}(x + H^{-1}g)^T H (x + H^{-1}g) + c - \tfrac{1}{2} g^T H^{-1} g. \]
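For part (2), we first show that $\{x \in \mathbb{R}^n : f(x) \le l\}$ is the union of two convex components. Write $H = QDQ^T$ so that $D$ is diagonal and $Q$ is orthogonal, and write $h(u) = \frac{1}{2}u^T D u$. We have $h(u) = f(Qu - H^{-1}g) - c + \frac{1}{2}g^T H^{-1} g$. To simplify notation, let
\[ \tilde l := l - \left[c - \tfrac{1}{2} g^T H^{-1} g\right]. \]
Consider the set
\[ S_+ := \{u \in \mathbb{R}^n : h(u) \le \tilde l,\ u_n > 0\}. \]
Now,
\[ h(u) \le \tilde l \iff \frac{1}{2}\sum_{i=1}^{n} \lambda_i u_i^2 \le \tilde l \iff \sum_{i=1}^{n-1} \lambda_i u_i^2 \le 2\tilde l - \lambda_n u_n^2. \]
For given values $u_1, \dots, u_{n-1}$, provided that $u_n > 0$, we have
\[ u_n \ge \sqrt{\frac{-2\tilde l + \sum_{i=1}^{n-1} \lambda_i u_i^2}{-\lambda_n}}. \]
Consider the function $\tilde g : \mathbb{R}^{n-1} \to \mathbb{R}$ defined by
\[ \tilde g(v) = \sqrt{\frac{-2\tilde l + \sum_{i=1}^{n-1} \lambda_i v_i^2}{-\lambda_n}}. \]
The set $S_+$ is the epigraph of $\tilde g$, so $S_+$ is a convex set if and only if $\tilde g$ is a convex function. We proceed to show that $\tilde g$ is convex, that is $\frac{1}{2}[\tilde g(v) + \tilde g(w)] \ge \tilde g(\frac{1}{2}(v + w))$ for all $v, w \in \mathbb{R}^{n-1}$. We have
\[ \tfrac{1}{2}[\tilde g(v) + \tilde g(w)] \ge \tilde g\big(\tfrac{1}{2}(v + w)\big) \iff \sqrt{-2\tilde l + \sum_{i=1}^{n-1} \lambda_i v_i^2} + \sqrt{-2\tilde l + \sum_{i=1}^{n-1} \lambda_i w_i^2} \ge \sqrt{-8\tilde l + \sum_{i=1}^{n-1} \lambda_i (v_i + w_i)^2}, \]
and squaring both sides and cancelling common terms shows that this is in turn equivalent to
\[ \sqrt{-2\tilde l + \sum_{i=1}^{n-1} \lambda_i v_i^2}\; \sqrt{-2\tilde l + \sum_{i=1}^{n-1} \lambda_i w_i^2} \ge -2\tilde l + \sum_{i=1}^{n-1} \lambda_i v_i w_i. \tag{3} \]
Let $\tilde v, \tilde w \in \mathbb{R}^n$ be such that
\[ \tilde v_i = \sqrt{\lambda_i}\, v_i \text{ for } 1 \le i \le n-1, \quad \tilde v_n = \sqrt{-2\tilde l}, \quad \tilde w_i = \sqrt{\lambda_i}\, w_i \text{ for } 1 \le i \le n-1, \quad \text{and } \tilde w_n = \sqrt{-2\tilde l}. \]
The Cauchy-Schwarz inequality gives $\|\tilde v\|_2 \|\tilde w\|_2 \ge \langle \tilde v, \tilde w \rangle$, which is exactly (3), establishing the convexity of $\tilde g$ as needed. Similarly, $S_-$ defined by $\{u \in \mathbb{R}^n : h(u) \le \tilde l,\ u_n < 0\}$ is also convex, and so $\{u \in \mathbb{R}^n : h(u) \le \tilde l\}$, being the union of $S_+$ and $S_-$, is the union of two convex sets. The closest points between the sets $S_+$ and $S_-$ are $\pm\sqrt{2\tilde l / \lambda_n}\, e_n$, where $e_n$ is the $n$-th elementary vector.

The sets $\{u \in \mathbb{R}^n : h(u) \le \tilde l\}$ and $\{u \in \mathbb{R}^n : f(Qu - H^{-1}g) - c + \frac{1}{2}g^T H^{-1} g \le \tilde l\}$ are identical, and are related to $\{x \in \mathbb{R}^n : f(x) \le l\}$ by an orthogonal transformation and a translation. This gives the formula of $x$ and $y$ as needed. $\square$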
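The closed-form closest points in Proposition 2.2 can be checked numerically in the special case where $H$ is already diagonal, so that $Q$ is the identity and $q_n = e_n$. The sketch below is an illustration under that assumption; the function names and the sample data are hypothetical.

```python
import math


def quad_diag(lams, g, c, x):
    """Evaluate f(x) = 0.5 * sum(lam_i x_i^2) + g^T x + c for diagonal H."""
    return (0.5 * sum(l * xi * xi for l, xi in zip(lams, x))
            + sum(gi * xi for gi, xi in zip(g, x)) + c)


def closest_points_diag(lams, g, c, level):
    """Closest points between the two components of {f <= level}.

    Assumes H = diag(lams) with lams[-1] < 0 and all other entries positive.
    """
    n = len(lams)
    center = [-g[i] / lams[i] for i in range(n)]                 # -H^{-1} g
    crit = c - 0.5 * sum(g[i] ** 2 / lams[i] for i in range(n))  # critical level
    if level >= crit:
        raise ValueError("level must lie below the critical level")
    t = math.sqrt(2.0 * (level - crit) / lams[-1])               # both factors negative
    x, y = center[:], center[:]
    x[-1] += t
    y[-1] -= t
    return x, y, crit


lams, g, c = (2.0, -0.5), (2.0, 1.0), 1.0
x, y, crit = closest_points_diag(lams, g, c, level=0.0)
```

Both returned points lie on the level surface $f = l$, and their midpoint is the critical point $-H^{-1}g$.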

3. Algorithm for finding saddle points 

In this section, we present an optimality condition for the closest points of level sets, followed 
by our main algorithm to find saddle points of mountain pass type. Here is our first observation 
of level sets. 

Proposition 3.1. (Closest points to level sets of functions) Suppose that two sets $A, B$ in $\mathbb{R}^n$ are defined by $A = \{x : f(x) \le l\}$ and $B = \{x : g(x) \le l\}$ for $l \in \mathbb{R}$ and $\mathcal{C}^1$ functions $f : \mathbb{R}^n \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}$. Assume that $A$ and $B$ are convex, and that $A \cap B = \emptyset$. The points $a \in A$ and $b \in B$ are minimizers of
\[ \min \|a - b\|_2 \quad \text{s.t. } a \in A \text{ and } b \in B, \tag{4} \]
if and only if
\[ f(a) = l, \quad g(b) = l, \quad \text{and} \quad \left\langle \frac{\nabla f(a)}{\|\nabla f(a)\|}, \frac{b - a}{\|b - a\|} \right\rangle = \left\langle \frac{\nabla g(b)}{\|\nabla g(b)\|}, \frac{a - b}{\|a - b\|} \right\rangle = 1. \tag{5} \]

Proof. For the forward direction, fix $a$ and consider
\[ \min \|a - b\|_2 \quad \text{s.t. } b \in B. \]
Since $B$ is convex, it is well known that the minimizer to the problem is the projection of $a$ to the set $B$ and is unique. Furthermore, $\left\langle \frac{\nabla g(b)}{\|\nabla g(b)\|}, \frac{a - b}{\|a - b\|} \right\rangle = 1$. The equation for the other inner product comes by fixing $b$ and varying $a$.

For the backward direction, suppose that (5) holds. So $\|a - b\|_2$ is an upper bound on (4). The halfspace $A' := \{x \mid \nabla f(a)^T (x - a) \le 0\}$ contains $A$, and similarly for $B' := \{x \mid \nabla g(b)^T (x - b) \le 0\}$. The distance between $A'$ and $B'$ can be easily checked to be $\|a - b\|_2$. This means that $\|a - b\|_2$ equals the value of (4), which gives us what we need. $\square$
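The optimality condition (5) is straightforward to test numerically. The following sketch computes the two normalized inner products for a pair of points; the two-disc example and all function names are illustrative assumptions rather than part of the analysis above.

```python
import math


def cosine(u, v):
    """Cosine of the angle between vectors u and v (both assumed nonzero)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def alignment(grad_f_a, grad_g_b, a, b):
    """Return the two inner products appearing in condition (5)."""
    b_minus_a = [bi - ai for ai, bi in zip(a, b)]
    a_minus_b = [-t for t in b_minus_a]
    return cosine(grad_f_a, b_minus_a), cosine(grad_g_b, a_minus_b)


# Hypothetical example: A and B are unit discs centered at (-2, 0) and (2, 0),
# i.e. f(x) = |x - (-2, 0)|^2, g(x) = |x - (2, 0)|^2 and level l = 1.
def grad_f(x):
    return [2 * (x[0] + 2), 2 * x[1]]


def grad_g(x):
    return [2 * (x[0] - 2), 2 * x[1]]


a, b = [-1.0, 0.0], [1.0, 0.0]          # the true closest pair
good = alignment(grad_f(a), grad_g(b), a, b)

a_bad = [-2.0, 1.0]                      # on the boundary of A, but not closest to B
bad = alignment(grad_f(a_bad), grad_g(b), a_bad, b)
```

At the closest pair both inner products equal one, while for the perturbed pair at least one of them is strictly smaller, so the deficit from one can serve as a measure of violation.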

We now propose an algorithm for finding saddle points of mountain pass type, concentrating 
on the case where / is an exact quadratic to simplify our analysis. 

Algorithm 3.2. (Estimating critical level) Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is an exact quadratic with unknown parameters that has a Hessian with $n-1$ positive eigenvalues and one negative eigenvalue. This algorithm approximates $\bar x$ such that $\nabla f(\bar x) = 0$.

(1) Fix $i = 1$ and $\epsilon > 0$.

(2) Let $l_i := \max\{f(x_0), f(y_0)\}$.

(3) Run Algorithm 3.3 to find 2 points $x_i$ and $y_i$ satisfying (7) for $\tilde x = x_i$ and $\tilde y = y_i$.

(4) If one of the convergence criteria holds:

(a) $\|x_i - y_i\|$ is sufficiently small, or

(b) $\frac{1}{2}[\nabla f(x_i) + \nabla f(y_i)]$, which equals $\nabla f(\frac{1}{2}(x_i + y_i))$, is sufficiently small in norm,

then return $\frac{1}{2}[x_i + y_i]$. Otherwise, let $l_{i+1} > l_i$ be a lower approximate of the critical value, increase the value of $i$ and return to step 3.



Algorithm 3.2 uses the following algorithm to find iterates in step 3.



Algorithm 3.3. (Estimating closest points in level sets) Consider $f : \mathbb{R}^n \to \mathbb{R}$ in Algorithm 3.2. Our terminating condition is motivated by the optimality condition (5). For inputs $l_i < f(\bar x)$ and $x, y \in \mathbb{R}^n$ such that $f(x) = f(y) = l_i$ and a parameter $0 < \epsilon \ll 1$, this algorithm returns two points $\tilde x, \tilde y \in \mathbb{R}^n$ such that $f(\tilde x) = f(\tilde y) = l_i$ and (7) holds.

(1) Fix $j = 1$, $x_1 = x$ and $y_1 = y$.

(2) Let $d_1 = \frac{y_j - x_j}{\|y_j - x_j\|}$. Choose a second direction $d_2$ from $x_j$, $y_j$, $\nabla f(x_j)$ and $\nabla f(y_j)$.

(3) Let $A_j$ be the affine space passing through $x_j$ with lineality space the span of $d_1$ and $d_2$. Use Proposition 2.1 and further evaluations of $f$ on $A_j$ to determine the quadratic model of $f$ on $A_j$ (which will be exact since $f$ is assumed to be quadratic). Consider
\[ \min \|x - y\| \quad \text{s.t. } x \in S(x_j),\ y \in S(y_j), \tag{6} \]
where $S(z)$ is the component of the level set $\{u \in A_j : f(u) \le l_i\}$ that contains the point $z$. Use Proposition 2.2 to find the minimizers to (6), and let them be the new iterates $x_{j+1}$ and $y_{j+1}$. Clearly, we have $f(x_{j+1}) = f(y_{j+1}) = l_i$.

(4) Increase $j$, and go back to step 2 unless one of the convergence criteria holds:

(a) For $\tilde x = x_j$ and $\tilde y = y_j$, we have
\[ \left\langle \frac{\nabla f(\tilde x)}{\|\nabla f(\tilde x)\|}, \frac{\tilde y - \tilde x}{\|\tilde y - \tilde x\|} \right\rangle > 1 - \epsilon \quad \text{and} \quad \left\langle \frac{\nabla f(\tilde y)}{\|\nabla f(\tilde y)\|}, \frac{\tilde x - \tilde y}{\|\tilde x - \tilde y\|} \right\rangle > 1 - \epsilon. \tag{7} \]

(b) $\frac{1}{2}[\nabla f(x_j) + \nabla f(y_j)]$ has sufficiently small norm.
Note that in the algorithms above, there are still choices to be made on the value $l_{i+1}$ in step 4 of Algorithm 3.2 and the direction $d_2$ in step 2 of Algorithm 3.3, which we will discuss later. In Subsection 4.2, we shall see that the conjugate gradient algorithm is similar to Algorithms 3.2 and 3.3 combined.



Remark 3.4. (Using largest affine space possible) In Algorithm 3.3, the affine space $A_j$ is only of dimension 2. A straightforward extension of Algorithm 3.3 is to amend step (3) as follows:

(3(H)) In step (3), let $A_j$ be the affine space passing through all previously evaluated points and $x_j + d_2$ instead, and proceed as in the rest of step (3).

This approach also corresponds to finding the quadratic approximation on the largest possible affine space with the data at hand. We refer to the algorithms with this modification as Algorithm 3.2(H) and Algorithm 3.3(H).

For readers who wish to apply the methods in this paper for finding saddle points of Morse index higher than 1, the algorithms here can be extended in the spirit of [18].

4. Choice of second direction $d_2$ in Algorithm 3.3

As remarked after Algorithm 3.3, the choice of $d_2$ has to be made in step (2) there. In this section, we present and explain the different choices of $d_2$ summarized in Table 2.

Table 2. List of strategies to find second direction $d_2$ in step 2 of Algorithm 3.3.

(3D) (Three directions) Instead of choosing one direction $d_2$, choose two directions $d_2$ and $d_3$, such that both $\nabla f(x_j)$ and $\nabla f(y_j)$ lie in the span of $\{d_1, d_2, d_3\}$. The affine space in Algorithm 3.3 is 3-dimensional instead of 2-dimensional. See Subsection 4.1.

(MG) (Midpoint gradient) Let $d_2 = \frac{1}{2}[\nabla f(x_j) + \nabla f(y_j)]$, which is an approximate of $\nabla f(\frac{1}{2}[x_j + y_j])$. See Subsection 4.2.

(MV) (Maximum violation of (7)) Choose $d_2$ by
\[ d_2 = \begin{cases} \nabla f(x_j) & \text{if } \left\langle \frac{\nabla f(x_j)}{\|\nabla f(x_j)\|}, \frac{y_j - x_j}{\|y_j - x_j\|} \right\rangle < \left\langle \frac{\nabla f(y_j)}{\|\nabla f(y_j)\|}, \frac{x_j - y_j}{\|x_j - y_j\|} \right\rangle, \\ \nabla f(y_j) & \text{otherwise.} \end{cases} \]
See Subsections 4.3 and 4.4.

(PM) (Power maximization) Choose $d_2$ so that $\|v_x\|^2 + \|v_y\|^2$ is maximal, where $v_x$ is the projection of $\nabla f(x_j)$ onto the space spanned by $d_1$ and $d_2$, and similarly for $v_y$. See Subsection 4.3.

(MD) (Midpoint distance) Find the minimizer $z$ of
\[ \min \left\| \tfrac{1}{2}(x_j + y_j) - z \right\| \quad \text{s.t. } \langle z - x_j, \nabla f(x_j) \rangle = 0,\ \langle z - y_j, \nabla f(y_j) \rangle = 0, \]
and let $d_2 = z - \frac{1}{2}(x_j + y_j)$. If $\left\| \frac{\nabla f(x_j)}{\|\nabla f(x_j)\|} + \frac{\nabla f(y_j)}{\|\nabla f(y_j)\|} \right\|$ gets too close to 0, we use (MV) instead to avoid numerical difficulties. See Subsection 4.4.

(H) (Using largest affine space possible) In Remark 3.4, we suggested using the largest affine space possible that can be created from previous evaluations of $f(\cdot)$ and the gradients $\nabla f(\cdot)$. The Hessian $H_j$ of the model $f_j$ in Assumption 4.2 grows in size as the algorithm progresses.

4.1. Choosing two additional directions $d_2$ and $d_3$ instead of one additional direction $d_2$. The strategy in (3D) is to obtain the directions $d_2$ and $d_3$ such that both $\nabla f(x_j)$ and $\nabla f(y_j)$ lie in the span of $\{d_1, d_2, d_3\}$. We shall see in Remark 4.4 that this strategy is the best among the five strategies presented.

4.2. Straightforward generalization of the conjugate gradient algorithm. We recall that the conjugate gradient algorithm, which is now considered classical in optimization, can be stated as follows.

Algorithm 4.1. (Conjugate gradient algorithm) Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is defined by $f(x) = \frac{1}{2}x^T H x + g^T x + c$, where $H \in \mathbb{R}^{n \times n}$ is positive definite. Then the conjugate gradient algorithm with starting iterate $z_0$ can be expressed as follows:

(1) Start with iterate $z_0 \in \mathbb{R}^n$, and let $i = 0$.

(2) Evaluate $\nabla f(z_i)$, and let $A_i$ be the affine space through $z_0$ with lineality space spanned by $\{\nabla f(z_0), \dots, \nabla f(z_i)\}$. Let $z_{i+1}$ be the point on $A_i$ such that $\nabla f(z_{i+1})$ is orthogonal to all elements in $\{\nabla f(z_0), \dots, \nabla f(z_i)\}$.

(3) Increase $i$ by 1 and go to step 2 till $\|\nabla f(z_i)\|$ is sufficiently small.

In step 2, the point $z_{i+1}$ also minimizes $f$ on the affine space $A_i$. The gradient $\nabla f(z_{i+1})$ is also orthogonal to the lineality space of $A_i$.
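For comparison with the affine-space description in Algorithm 4.1, here is a minimal sketch of the textbook conjugate gradient recurrence for minimizing $\frac{1}{2}x^T H x + g^T x + c$ with $H$ positive definite. For positive definite $H$ this standard recurrence produces iterates that minimize $f$ over the same affine spaces; it is given only as a familiar reference point and is not the indefinite variant developed in this paper. The helper names and test data are illustrative assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(H, x):
    return [dot(row, x) for row in H]


def conjugate_gradient(H, g, z0, tol=1e-12):
    """Textbook CG for min 0.5 z^T H z + g^T z, H symmetric positive definite."""
    z = list(z0)
    r = [-(hz + gi) for hz, gi in zip(matvec(H, z), g)]   # r = -grad f(z)
    p = list(r)
    rs = dot(r, r)
    for _ in range(len(z0)):                 # at most n steps in exact arithmetic
        if rs ** 0.5 <= tol:
            break
        Hp = matvec(H, p)
        alpha = rs / dot(p, Hp)              # exact line search along p
        z = [zi + alpha * pi for zi, pi in zip(z, p)]
        r = [ri - alpha * hpi for ri, hpi in zip(r, Hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return z


H = [[4.0, 1.0], [1.0, 3.0]]
g = [-1.0, -2.0]
z = conjugate_gradient(H, g, [2.0, 1.0])
grad_at_z = [hz + gi for hz, gi in zip(matvec(H, z), g)]
```

The returned point satisfies $Hz + g \approx 0$, that is, it approximates the critical point $-H^{-1}g$.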

We now explain the strategy (MG). Let the midpoint of the iterates $x_j$ and $y_j$ in Algorithm 3.3 be $z_j$. We see that $\nabla f(z_j)$ is orthogonal to the lineality space of the affine space $A_j$. The midpoint $z_j$ plays the role of $z_i$ in the conjugate gradient algorithm above. If the direction $d_2$ in Algorithm 3.3(H) was chosen to be $\nabla f(z_j)$, then Algorithm 3.3(H)(MG) is identical to the conjugate gradient algorithm stated in Algorithm 4.1, but with the condition that $H$ be positive definite dropped. The midpoint $z_j$ can also be calculated without computing $x_j$ and $y_j$.

This strategy is also appealing for a few reasons. In Algorithm 4.1, finding the point $z_{i+1}$ in step 2 using Proposition 2.2 requires the solution of a linear system with a symmetric (though not necessarily positive definite) matrix but not an eigen-decomposition like in Algorithm 3.3. Algorithm 4.1 also does not require the knowledge of the Morse index of the critical point.



4.3. Preserving the optimality condition (7). We now state an assumption that will be used later.

Assumption 4.2. Suppose $f(x) = \frac{1}{2}x^T H x + g^T x + c$. At iteration $j$ of Algorithm 3.3, let $L_j \in \mathbb{R}^{n \times 2}$ be such that the columns of $L_j$ are orthonormal and span the directions $d_1$ and $d_2$ in Algorithm 3.3, with the first column of $L_j$ being $d_1$. Let $H_j \in S^2$, $g_j \in \mathbb{R}^2$ and $c_j \in \mathbb{R}$ be defined by
\[ H_j = L_j^T H L_j, \quad g_j = L_j^T[Hx_j + g] \quad \text{and} \quad c_j = \tfrac{1}{2} x_j^T H x_j + g^T x_j + c, \tag{8} \]
so that $f_j : \mathbb{R}^2 \to \mathbb{R}$ defined by $f_j(v) = f(L_j v + x_j)$ equals $\frac{1}{2}v^T H_j v + g_j^T v + c_j$.
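The reduced model in Assumption 4.2 can be formed with a few matrix-vector products. The sketch below assumes that the two directions passed in are already orthonormal; the helper names and the sample data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(H, x):
    return [dot(row, x) for row in H]


def quad(H, g, c, x):
    """f(x) = 0.5 x^T H x + g^T x + c."""
    return 0.5 * dot(x, matvec(H, x)) + dot(g, x) + c


def reduced_model(H, g, c, x_j, d1, d2):
    """Return (H_j, g_j, c_j) of (8) for L_j = [d1 d2] with orthonormal columns."""
    cols = (d1, d2)
    grad_xj = [hx + gi for hx, gi in zip(matvec(H, x_j), g)]   # H x_j + g
    H_j = [[dot(p, matvec(H, q)) for q in cols] for p in cols]
    g_j = [dot(p, grad_xj) for p in cols]
    c_j = quad(H, g, c, x_j)
    return H_j, g_j, c_j


def model_value(H_j, g_j, c_j, v):
    return 0.5 * dot(v, matvec(H_j, v)) + dot(g_j, v) + c_j


H = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, -1.0]]
g, c = [1.0, 0.0, 0.0], 0.0
x_j, d1, d2 = [1.0, 1.0, 1.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]
H_j, g_j, c_j = reduced_model(H, g, c, x_j, d1, d2)

v = [0.5, -2.0]
point = [xi + v[0] * a + v[1] * b for xi, a, b in zip(x_j, d1, d2)]   # L_j v + x_j
```

Evaluating the two-dimensional model at `v` gives the same value as evaluating $f$ at `point`, which is the identity $f_j(v) = f(L_j v + x_j)$.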



Other facts immediate from Assumption 4.2 are that the mapping $v \mapsto L_j v + x_j$ is a bijection between $\mathbb{R}^2$ and the affine space $A_j$ passing through $x_j$ with lineality space spanned by $d_1$ and $d_2$. We now explain the strategies (MV) and (PM), and first note the following easy result.

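Fact 4.3. (Preservation of constraint violation) Suppose Assumption 4.2 holds. Let $e_k$ be the $k$-th elementary vector in $\mathbb{R}^2$. We have
\[ \nabla f_j(0) = g_j = L_j^T[Hx_j + g] = L_j^T \nabla f(x_j), \]
and
\[ \nabla f_j(\|y_j - x_j\| e_1) = H_j \|y_j - x_j\| e_1 + g_j = L_j^T\big(H L_j \|y_j - x_j\| e_1 + Hx_j + g\big) = L_j^T[Hy_j + g] = L_j^T \nabla f(y_j). \]
If $d_2$ is chosen to be $\nabla f(x_j)$, then $\|\nabla f_j(0)\| = \|L_j^T \nabla f(x_j)\| = \|\nabla f(x_j)\|$, which gives
\[ \left\langle \frac{\nabla f_j(0)}{\|\nabla f_j(0)\|}, e_1 \right\rangle = \left\langle \frac{L_j^T \nabla f(x_j)}{\|\nabla f(x_j)\|}, e_1 \right\rangle = \left\langle \frac{\nabla f(x_j)}{\|\nabla f(x_j)\|}, \frac{y_j - x_j}{\|y_j - x_j\|} \right\rangle. \tag{9} \]
Similarly, if $d_2$ is chosen to be $\nabla f(y_j)$, then $\|\nabla f_j(\|y_j - x_j\| e_1)\| = \|\nabla f(y_j)\|$, which gives
\[ \left\langle \frac{\nabla f_j(\|y_j - x_j\| e_1)}{\|\nabla f_j(\|y_j - x_j\| e_1)\|}, -e_1 \right\rangle = \left\langle \frac{\nabla f(y_j)}{\|\nabla f(y_j)\|}, \frac{x_j - y_j}{\|x_j - y_j\|} \right\rangle. \tag{10} \]

In view of the optimality conditions in (5), one choice of $d_2$ is marked as (MV) in Table 2. In other words, the maximum violation of the optimality conditions is preserved in the model $f_j : \mathbb{R}^2 \to \mathbb{R}$. It is clear that (3D) preserves the violation in both optimality conditions.

Notice also in Fact 4.3 that if $d_2$ were chosen to be $\nabla f(x_j)$, then the projection of $\nabla f(x_j)$ onto the subspace spanned by $d_1$ and $d_2$ is exactly $\nabla f(x_j)$ itself. A similar thing happens if $d_2$ were chosen to be $\nabla f(y_j)$. This motivates another strategy (PM) in Table 2.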
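One reading of the (MV) rule is: compare the two inner products in the optimality condition and take as $d_2$ the gradient whose condition is violated more, that is, whose inner product is smaller. The sketch below implements that reading; it is an interpretation of the rule rather than a definitive statement of it, and the function names and data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between vectors u and v (both assumed nonzero)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def choose_d2_mv(x, y, grad_x, grad_y):
    """Pick the gradient whose alignment condition is most violated."""
    y_minus_x = [yi - xi for xi, yi in zip(x, y)]
    x_minus_y = [-t for t in y_minus_x]
    cx = cosine(grad_x, y_minus_x)     # equals 1 when the condition at x holds
    cy = cosine(grad_y, x_minus_y)     # equals 1 when the condition at y holds
    return list(grad_x) if cx < cy else list(grad_y)


x, y = [0.0, 0.0], [2.0, 0.0]
grad_x = [1.0, 0.0]      # perfectly aligned with y - x
grad_y = [-1.0, 1.0]     # misaligned with x - y
d2 = choose_d2_mv(x, y, grad_x, grad_y)
```

Here the condition at `y` is the violated one, so the rule selects `grad_y`.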

4.4. Using affine spaces to approximate the level set. We now explain strategy (MD), and give a second explanation for (MV). As illustrated in Figure 3, the region $\{u \in \mathbb{R}^n : f(u) = l_i\}$ near $x_j$ and $y_j$ can be approximated by
\[ \{u \in \mathbb{R}^n : \nabla f(x_j)^T (u - x_j) = 0\} \quad \text{and} \quad \{u \in \mathbb{R}^n : \nabla f(y_j)^T (u - y_j) = 0\} \text{ respectively.} \tag{11} \]

Figure 3. We illustrate the method (MD) of finding the direction $d_2$. The boundaries of the level set $\{x : f(x) \le l\}$ are approximated by affine spaces in (11), and the direction $d_2$ equals the vector $z - m$, where $m = \frac{1}{2}(x_j + y_j)$ and $z$ is the closest point from $m$ to the intersection of the affine spaces.

Provided that $\nabla f(x_j)$ and $\nabla f(y_j)$ are not multiples of each other, the two affine spaces intersect in an affine space of dimension $n - 2$. Let the projection of $\frac{1}{2}(x_j + y_j)$ onto this affine space be $z$. We shall choose $d_2$ so that the affine space through $x_j$ with lineality space spanned by $d_1$ and $d_2$ passes through $z$. Such a strategy is sensible because perturbing $x_j$ and $y_j$ with $z - x_j$ and $z - y_j$ as tangent directions respectively can give good decrease, though not necessarily optimal decrease. To calculate $d_2$, we observe that the normals of the intersection of the affine spaces in (11) are linear combinations of $\nabla f(x_j)$ and $\nabla f(y_j)$. We can then write $d_2 = \alpha \nabla f(x_j) + \beta \nabla f(y_j)$, where $\alpha$ and $\beta$ are determined by
\[ \nabla f(x_j)^T \left[ \tfrac{1}{2}(x_j + y_j) + d_2 - x_j \right] = 0 \quad \text{and} \quad \nabla f(y_j)^T \left[ \tfrac{1}{2}(x_j + y_j) + d_2 - y_j \right] = 0, \]
that is,
\[ \begin{pmatrix} \nabla f(x_j)^T \nabla f(x_j) & \nabla f(x_j)^T \nabla f(y_j) \\ \nabla f(y_j)^T \nabla f(x_j) & \nabla f(y_j)^T \nabla f(y_j) \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \frac{1}{2} \begin{pmatrix} \nabla f(x_j)^T (x_j - y_j) \\ \nabla f(y_j)^T (y_j - x_j) \end{pmatrix}. \]

If $\left\| \frac{\nabla f(x_j)}{\|\nabla f(x_j)\|} + \frac{\nabla f(y_j)}{\|\nabla f(y_j)\|} \right\|$ gets too close to 0, the value of $\|\frac{1}{2}(x_j + y_j) - z\|$ is likely to be much larger than $\|x_j - y_j\|$, and is likely to cause numerical difficulties. For this reason, the method (MD) needs to switch to some other method if $\|\frac{1}{2}(x_j + y_j) - z\|$ is greater than a fixed multiple of $\|x_j - y_j\|$.

More sophisticated estimates of the closest points can be devised, but we shall see in Section 8 that (MD) performs well compared to the other algorithms.
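The (MD) direction reduces to a $2 \times 2$ linear solve for $\alpha$ and $\beta$. The following sketch solves that system by Cramer's rule; the near-parallel threshold, the function name and the test data are illustrative assumptions, and a practical code would fall back to another strategy as described above instead of raising an error.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def md_direction(x, y, gx, gy, rel_tol=1e-12):
    """Return (d2, z): z is the projection of the midpoint m of x and y onto the
    intersection of {u : gx^T (u - x) = 0} and {u : gy^T (u - y) = 0}, d2 = z - m."""
    m = [0.5 * (xi + yi) for xi, yi in zip(x, y)]
    a11, a12, a22 = dot(gx, gx), dot(gx, gy), dot(gy, gy)
    b1 = 0.5 * dot(gx, [xi - yi for xi, yi in zip(x, y)])
    b2 = 0.5 * dot(gy, [yi - xi for xi, yi in zip(x, y)])
    det = a11 * a22 - a12 * a12
    if det <= rel_tol * a11 * a22:
        raise ValueError("gradients are (nearly) parallel; use another strategy")
    alpha = (b1 * a22 - a12 * b2) / det
    beta = (a11 * b2 - a12 * b1) / det
    d2 = [alpha * p + beta * q for p, q in zip(gx, gy)]
    z = [mi + di for mi, di in zip(m, d2)]
    return d2, z


x, y = [1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]
gx, gy = [1.0, 1.0, 0.0], [-1.0, 1.0, 0.0]
d2, z = md_direction(x, y, gx, gy)
```

In this example the two hyperplanes are $u_1 + u_2 = 1$ and $u_2 - u_1 = 1$, and the projection of the midpoint (the origin) onto their intersection is $(0, 1, 0)$.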

Lastly, we consider the case of fixing $y_j$ and perturbing $x_j$. The region $\{u \in \mathbb{R}^n : f(u) = l_i\}$ near $x_j$ can be approximated by $\{u \in \mathbb{R}^n : \nabla f(x_j)^T (u - x_j) = 0\}$. With elementary geometry, we can show that the best tangent direction to perturb $x_j$ in is $d_1 - \frac{1}{\|\nabla f(x_j)\|^2} [\nabla f(x_j)^T d_1] \nabla f(x_j)$. Choosing $d_2$ to be $\nabla f(x_j)$ allows one to perturb $x_j$ in that direction. This strategy gives the same choice of $d_2$ as (MV) earlier. The case of fixing $x_j$ and perturbing $y_j$ is similar.

Remark 4.4. (Directions in (3D)) The directions $d_2$ in the methods (MV), (PM), (MD) and (MG) in Table 2 are linear combinations of $\nabla f(x_j)$ and $\nabla f(y_j)$. Hence the method (3D) will always be better than the other methods.



5. More on $l_i$ in Algorithm 3.2, and methods (MG) and (3D)

In the case where $f : \mathbb{R}^n \to \mathbb{R}$ is not necessarily a quadratic, the level $l_i$ in Algorithm 3.2 clearly needs to converge to a critical value so that the iterates converge to the corresponding critical point. Once near the critical point, the quadratic approximation gets better, and Algorithms 3.2 and 3.3 become more effective. In this section, we clarify that when $f$ is a quadratic function, the role of $l_i$ is, on the contrary, not as important.

We look at the method (MG) in Table 2. As remarked in Subsection 4.2, the strategy (MG) does not use the level set structure for its computations. The next result explains that the choice of $l_i$ is irrelevant for both (MG) and (3D).

Proposition 5.1. (Irrelevance of $l_i$ in (MG) and (3D)) If $f : \mathbb{R}^n \to \mathbb{R}$ is a quadratic $f(x) =
\frac{1}{2} x^T H x + g^T x + c$, then the two dimensional space spanned by $d_1$ and $d_2$ in Algorithm 3.3(MG)
does not depend on $l_i$. Similarly, the three dimensional subspace spanned by $d_1$, $d_2$ and $d_3$ does
not depend on $l_i$ in Algorithm 3.3(3D). Similar conclusions hold for Algorithm 3.3(H)(MG) and
Algorithm 3.3(H)(3D).



Proof. Let $x_j$ and $y_j$ be iterates in Algorithm 3.3. The direction $d_1$ is common to both strategies
(MG) and (3D). By looking at the quadratic models on the affine space and using Proposition 2.2,
we see that the direction of $y_j - x_j$ is independent of $l_i$.

We first look at the case of strategy (MG). The corresponding $d_2$ equals $\nabla f(\frac{1}{2}(x_j + y_j))$, which
in turn does not depend on $l_i$. For strategy (3D), the directions $d_2$ and $d_3$ can be chosen to
be $d_2 = \nabla f(\frac{1}{2}(x_j + y_j))$ and $d_3 = \nabla f(x_j) - d_2$. Given that $f$ is quadratic, the gradient map
$\nabla f : \mathbb{R}^n \to \mathbb{R}^n$ can be written as $\nabla f(x) = Hx + g$, and is affine. We have
\[
\nabla f(x_j) = H x_j + g = H\left(\tfrac{1}{2}[x_j - y_j]\right) + d_2.
\]
Since the direction of $x_j - y_j$ does not depend on $l_i$, the direction of $d_3 = H(\frac{1}{2}[x_j - y_j])$ does not
depend on $l_i$ either. The analysis for (H)(MG) and (H)(3D) is similar. $\Box$



The above result shows that in the case when $f$ is quadratic, we can rewrite Algorithm 3.2
(MG) and (3D) as the following equivalent algorithm without making use of the variable $l_i$.



Algorithm 5.2. (Equivalent algorithms for (MG) and (3D)) Given a quadratic function $f : \mathbb{R}^n \to
\mathbb{R}$ where $f(x) = \frac{1}{2} x^T H x + g^T x + c$, and the critical point $\bar{x} = -H^{-1} g$ has Morse index one,

(1) Let $i = 0$. Start with approximate critical point $z_0$, and let $A_0$ be some affine space
containing $z_0$.

(2) Use Proposition 2.1 and further evaluations of $f$ on $A_i$ to determine the quadratic model
of $f$ on $A_i$ (which will be exact since $f$ is assumed to be quadratic). From the quadratic
model, find the point $z_{i+1}$ in $A_i$ where $\nabla f(z_{i+1})$ is orthogonal to the lineality space of $A_i$.

(3) Let the lineality space of $A_i$ be spanned by the columns of the matrix $L_i$, where $L_i$ has
orthogonal columns. To find the next affine space $A_{i+1}$, we need to figure out the directions
$d_1$, $d_2$ and $d_3$ as follows:

(a) $d_1$ is the eigenvector corresponding to the negative eigenvalue of $\tilde{H} = L_i^T H L_i$.

(b) $d_2$ equals $\nabla f(z_{i+1})$.

(c) $d_3$ equals $\nabla f(z_{i+1} + \lambda d_1)$, where $\lambda$ is any nonzero scalar. The vector $d_3$ needs to be
calculated for (3D), but not for (MG).

The affine spaces $A_{i+1}$ for the different strategies all pass through $z_{i+1}$, but have different
lineality spaces:

(i) For (MG), the lineality space of $A_{i+1}$ is spanned by $\{d_1, d_2\}$.

(ii) For (3D), the lineality space of $A_{i+1}$ is spanned by $\{d_1, d_2, d_3\}$.

(iii) For (H)(MG), the lineality space of $A_{i+1}$ is spanned by the columns of $L_i$ and $d_2$.
(Note that $d_1$ lies in the column space of $L_i$.)

(iv) For (H)(3D), the lineality space of $A_{i+1}$ is spanned by the columns of $L_i$, $d_2$ and $d_3$.

From this information we can deduce $L_{i+1}$. Increase the counter $i$ by one and return to
step 2 until the convergence criterion of $\nabla f(z_i)$ being small in norm is met.

As can be seen from Algorithm 5.2, methods (H)(MG) and (H)(3D) are equivalent to a Krylov
subspace method. In (H)(3D), the Krylov subspace grows by two dimensions instead of one in
each iteration.
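The Krylov subspace behaviour of (H)(MG) can be illustrated with a small sketch. It assumes that $H$ and $g$ are known explicitly (so the quadratic model is exact), that the starting lineality space is spanned by a single user-supplied direction, and that the reduced matrix $L_i^T H L_i$ stays nonsingular. The helper names `solve` and `h_mg` are hypothetical. Since $d_1$ already lies in the column space of $L_i$ for the (H) variants, the sketch only appends the gradient $d_2$ in each step.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(H, x):
    return [dot(row, x) for row in H]

def gradient(H, g, x):
    return [hx + gi for hx, gi in zip(matvec(H, x), g)]

def solve(A, b):
    """Solve A t = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    t = [0.0] * n
    for k in reversed(range(n)):
        s = M[k][n] - sum(M[k][c] * t[c] for c in range(k + 1, n))
        t[k] = s / M[k][k]
    return t

def h_mg(H, g, z0, first_direction, tol=1e-10):
    """Sketch of Algorithm 5.2 with strategy (H)(MG) for an exact quadratic."""
    z = z0[:]
    size = math.sqrt(dot(first_direction, first_direction))
    L = [[v / size for v in first_direction]]
    for _ in range(len(z0)):
        G = gradient(H, g, z)
        # Reduced system (L^T H L) t = -L^T grad f(z): the gradient at z + L t
        # is then orthogonal to the lineality space spanned by L.
        A = [[dot(li, matvec(H, lj)) for lj in L] for li in L]
        b = [-dot(li, G) for li in L]
        t = solve(A, b)
        z = [zi + sum(tk * lk[i] for tk, lk in zip(t, L)) for i, zi in enumerate(z)]
        d2 = gradient(H, g, z)
        size = math.sqrt(dot(d2, d2))
        if size < tol:
            break
        L.append([v / size for v in d2])  # d2 is already orthogonal to L
    return z

# Illustrative data (an assumption): one negative eigenvalue, saddle at -H^{-1} g.
H = [[3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, -1.0]]
g = [1.0, -2.0, 0.5]
z = h_mg(H, g, [0.0, 0.0, 0.0], [0.0, 0.0, 1.0])
```

In this example the lineality space grows by one gradient per step, and the iterate reaches the saddle point $(-1/3, 1, 1/2)$ once the space spans $\mathbb{R}^3$.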

It is clear that while the other strategies in Table 2 do not enjoy the property in Proposition 5.1,
they can be more effective when $f$ is not a quadratic function. We have the following heuristic on
the choice of $l_{i+1}$.



Remark 5.3. (Choice of $l_{i+1}$) To choose $l_{i+1}$ from $l_i$ in Algorithm 3.2, one possible strategy is to
make use of Proposition 2.2. Given $f(x) = \frac{1}{2} x^T H x + g^T x + c$, the critical level is $c - \frac{1}{2} g^T H^{-1} g$, and the
distance between the components is
\[
2\sqrt{\frac{1}{\lambda_n}\left[2 l_i - (2c - g^T H^{-1} g)\right]},
\]
where the negative eigenvalue $\lambda_n$ of $H$ is approximated in step 3 of Algorithm 3.3. The critical level can be estimated from $\lambda_n$
and the distance between the components.

For $x_j$ and $y_j$ to be well defined, $l_i$ needs to be a lower bound of the critical level. If $l_i$ is found
to be larger than the critical value, then $l_i$ can be reduced so that it is below the critical value.
Contrast the management of $l_i$ to that in the main algorithm in [13], where a sequence of lower
bounds $\{l_i\}$ of the critical value is obtained through an optimization procedure. The $\{l_i\}$ there
converges superlinearly to the critical value.
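As a small worked illustration of the heuristic in Remark 5.3, the distance formula can be inverted to estimate the critical level from the current level, the distance between the components and the negative eigenvalue. The function names below are hypothetical, and the sketch assumes an exact quadratic with a single negative eigenvalue.

```python
import math

def component_distance(level, critical_level, lam_n):
    """Distance between the two components of {f <= level} for a quadratic
    with one negative eigenvalue lam_n (requires level < critical_level)."""
    return 2.0 * math.sqrt(2.0 * (level - critical_level) / lam_n)

def estimate_critical_level(level, distance, lam_n):
    """Invert the distance formula: critical level = level - lam_n * distance^2 / 8."""
    return level - lam_n * distance ** 2 / 8.0

# Illustrative check with f(x) = 0.5 * (x1^2 - 2 * x2^2): the critical level is 0,
# lam_n = -2, and the components of {f <= -1} are closest at (0, -1) and (0, 1).
dist = component_distance(-1.0, 0.0, -2.0)
estimate = estimate_critical_level(-1.0, dist, -2.0)
```

For a function that is only approximately quadratic, the same estimate can serve as a guide for choosing $l_{i+1}$, kept below the estimate so that it remains a lower bound.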

6. Augmenting ideas to a path based algorithm 

A commonly used method of finding an optimal mountain pass is still to discretize and perturb 
a path between two endpoints so that the maximum along the path decreases. We now remark on 
how the idea of finding a quadratic expression near a critical point can be augmented to a path 
based algorithm. 

A basic path-based algorithm can be described as follows. 

Algorithm 6.1. (Basic path based algorithm) Given $f : X \to \mathbb{R}$ and two points $a, b \in X$, find an
optimal mountain pass $p : [0, 1] \to X$ connecting $a$ and $b$.

(1) Consider a discretized path $p_1, p_2, \ldots, p_k$ where $p_1 = a$ and $p_k = b$.

(2) Find the maximizer of $f$ on the line segments $[p_1, p_2], [p_2, p_3], \ldots, [p_{k-1}, p_k]$, say $\bar{p}$.

(3) If $\|\nabla f(\bar{p})\|$ is sufficiently close to 0, then the algorithm ends. Otherwise, perturb the path
$p_1, p_2, \ldots, p_k$ based on the gradient $\nabla f(\bar{p})$ and other information. The path may also be
refined (i.e., more points can be used to describe the path) as necessary. Return to step
1.
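Step 2 of Algorithm 6.1 can be sketched as follows. The sketch replaces an exact line search by uniform sampling on each segment, which is an assumption made for brevity. The function name `max_on_path` and the test function are illustrative only.

```python
def max_on_path(f, points, samples_per_segment=50):
    """Approximate the maximizer of f over the polygonal path through `points`
    by sampling each segment uniformly (a crude stand-in for a line search)."""
    best_point, best_value = points[0], f(points[0])
    for p, q in zip(points, points[1:]):
        for s in range(1, samples_per_segment + 1):
            t = s / samples_per_segment
            u = [(1 - t) * a + t * b for a, b in zip(p, q)]
            value = f(u)
            if value > best_value:
                best_point, best_value = u, value
    return best_point, best_value

def f(u):
    # Two minimizers at (-1, 0) and (1, 0), and a saddle point at (0, 0).
    return (u[0] ** 2 - 1) ** 2 + u[1] ** 2

path = [[-1.0, 0.0], [0.0, 0.5], [1.0, 0.0]]
p_bar, value = max_on_path(f, path)
```

Here the maximum along the path is attained at the middle node $(0, 0.5)$, and the gradient there points away from the saddle point $(0, 0)$, which indicates how the path should be perturbed in step 3.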

The ideas on finding a quadratic approximation near the saddle point $\bar{x}$ can be incorporated
into the basic path based algorithm. The point $\bar{p}$ is the point most likely to be closest to the saddle
point, and the evaluations of $f$ near $\bar{p}$ can be used to deduce the quadratic approximation of $f$
near $\bar{x}$. The quadratic approximation of $f$ on an affine space through $\bar{p}$ can be constructed through
Propositions 2.1 and 2.2. The critical point is estimated to be $\bar{p} - H(\bar{p})^{-1} \nabla f(\bar{p})$ like in a Newton
method, but because the full information of $H(\bar{p})^{-1}$ may not be easily available, ideas from the
algorithm we proposed can give a good indication of how to perturb the path $p_1, p_2, \ldots, p_k$ to
reduce the maximum value of $f$ along the path.

7. Convergence analysis 

In this section, we prove in Theorem 7.1 a formula describing the rate of convergence of
algorithm (MG), and prove in Theorem 7.2 that Algorithm 3.3 will eventually find two points so
that (7) is satisfied. We begin with the analysis of Algorithm 5.2(H)(MG). For a positive definite
matrix $A$, let the norm $\|\cdot\|_A$ be defined by $\|v\|_A := \sqrt{v^T A v}$.

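Theorem 7.1. (Rate of convergence of (MG)) Suppose that $H \in \mathbb{S}^n$ has $n - 1$ positive eigenvalues
and one negative eigenvalue, all of which lie in $[\underline{\lambda}, \bar{\lambda}] \cup \{-\tilde{\lambda}\}$, where $0 < \underline{\lambda} < \bar{\lambda}$ and
$\tilde{\lambda} > 0$. Consider Algorithm 5.2(H)(MG) applied to finding the saddle point $\bar{z} = -H^{-1} g$ for
$f(x) = \frac{1}{2} x^T H x + g^T x + c$. For the $i$th iterate $z_i$, let the error $e_i$ be $z_i - \bar{z}$. Then for any positive
definite matrix $A$, the errors $e_i$ satisfy
\[
\frac{\|e_i\|_A}{\|e_0\|_A} \le 2\, \frac{\bar{\lambda} + \tilde{\lambda}}{\tilde{\lambda}} \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^{i-1},
\]
where $\kappa = \bar{\lambda} / \underline{\lambda}$.

Proof. Let $\Pi_i$ be the set of polynomials $p$ of degree at most $i$ such that $p(0) = 1$. Then by using standard methods
in the study of the convergence of the conjugate gradient algorithm (see for example [13, Section
9]), we have
\begin{align*}
\frac{\|e_i\|_A}{\|e_0\|_A}
&\le \min_{p \in \Pi_i} \max_{\lambda \in [\underline{\lambda}, \bar{\lambda}] \cup \{-\tilde{\lambda}\}} |p(\lambda)| \\
&\le \min_{p \in \Pi_{i-1}} \max_{\lambda \in [\underline{\lambda}, \bar{\lambda}]} \left| p(\lambda) \frac{\lambda + \tilde{\lambda}}{\tilde{\lambda}} \right| \\
&\le \frac{\bar{\lambda} + \tilde{\lambda}}{\tilde{\lambda}} \min_{p \in \Pi_{i-1}} \max_{\lambda \in [\underline{\lambda}, \bar{\lambda}]} |p(\lambda)| \\
&\le 2\, \frac{\bar{\lambda} + \tilde{\lambda}}{\tilde{\lambda}} \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^{i-1}.
\end{align*}
$\Box$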
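To get a feel for the bound in Theorem 7.1, it can be evaluated numerically. The sketch below assumes the bound reads $2\,\frac{\bar{\lambda} + \tilde{\lambda}}{\tilde{\lambda}} \big(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\big)^{i-1}$ with $\kappa = \bar{\lambda}/\underline{\lambda}$, where $[\underline{\lambda}, \bar{\lambda}]$ contains the positive eigenvalues and $-\tilde{\lambda}$ is the negative eigenvalue. The function and parameter names are hypothetical.

```python
import math

def mg_error_bound(i, lam_low, lam_up, lam_neg):
    """Upper bound on ||e_i||_A / ||e_0||_A under the assumptions stated above.

    lam_low, lam_up: bounds on the positive eigenvalues (0 < lam_low < lam_up).
    lam_neg: magnitude of the single negative eigenvalue (lam_neg > 0).
    """
    kappa = lam_up / lam_low
    rate = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    return 2.0 * (lam_up + lam_neg) / lam_neg * rate ** (i - 1)

# Illustrative values (an assumption): positive eigenvalues in [1, 4], negative eigenvalue -2.
bound_1 = mg_error_bound(1, 1.0, 4.0, 2.0)
bound_3 = mg_error_bound(3, 1.0, 4.0, 2.0)
```

The factor $(\bar{\lambda} + \tilde{\lambda})/\tilde{\lambda}$ shows that a negative eigenvalue of small magnitude inflates the constant, while the linear rate is governed by the positive part of the spectrum, as in the conjugate gradient algorithm.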



Theorem 7.2. (Convergence of iterates of Algorithm 3.3(MV)) Suppose Assumption 4.2 holds at
iteration $j$. Then for any $\epsilon > 0$, Algorithm 3.3(MV) produces some iterate $x_{j^*}$ and $y_{j^*}$ satisfying
(7) for $x = x_{j^*}$ and $y = y_{j^*}$.

Proof. From Assumption 4.2, we infer that the second column of $L$ is the unit vector in the
direction of $d_2 - \frac{1}{\|d_1\|^2} (d_1^T d_2) d_1$. Consider the model $f_j(v) = \frac{1}{2} v^T H_j v + g_j^T v + c_j$, where $H_j$, $g_j$
and $c_j$ are as chosen in (8) based on the iterates $x_j$ and $y_j$. Seeking a contradiction, suppose that
(7) is violated for all iterates.

Recall that at iteration $j$, if $d_2$ was chosen to be $\nabla f(x_j)$, then from Fact 4.3,
\[
\left\langle \frac{\nabla f_j(0)}{\|\nabla f_j(0)\|}, e_1 \right\rangle
= \left\langle \frac{\nabla f(x_j)}{\|\nabla f(x_j)\|}, \frac{y_j - x_j}{\|y_j - x_j\|} \right\rangle
< 1 - \epsilon.
\]
A similar inequality can be obtained if $d_2$ was chosen to be $\nabla f(y_j)$ instead. This means that
the pair $(0, \|y_j - x_j\| e_1)$ are not the points in the model that minimize the distance between the
components of $\{u \in \mathbb{R}^2 : f_j(u) \le l_i\}$. Step 3 in Algorithm 3.3 chooses iterates $x_{j+1}$ and $y_{j+1}$ such
that $\|x_{j+1} - y_{j+1}\| < \|x_j - y_j\|$.

From the formulas of the columns of $L_j$ in terms of $d_1$ and $d_2$, we see that $L_j$ depends continuously
on $d_1$ and $d_2$. We shall first assume that
\[
\left\langle \frac{\nabla f(x_j)}{\|\nabla f(x_j)\|}, \frac{y_j - x_j}{\|y_j - x_j\|} \right\rangle
\ne
\left\langle \frac{\nabla f(y_j)}{\|\nabla f(y_j)\|}, \frac{x_j - y_j}{\|x_j - y_j\|} \right\rangle.
\tag{12}
\]
Under this assumption, $d_1$ and $d_2$ are continuous in the iterates $(x_j, y_j)$, so the parameters $H_j$,
$g_j$ and $c_j$ in (8) depend continuously on $(x_j, y_j)$ as well. This implies that the eigenvalues and
eigenvectors of $H_j$ also depend continuously on $(x_j, y_j)$, and thus the next iterates also depend continuously
on the previous iterates through Proposition 2.2.

Recall our earlier assumption that Algorithm 3.3 does not produce iterates satisfying (7).
By compactness arguments in $\mathbb{R}^n$, there is a subsequence of the iterates $\{(x_j, y_j)\}$ not satisfying
(7) converging to some $(x', y')$. Since $f(x_j) = f(y_j) = l_i$ for all $j$, it follows that $f(x') = f(y') = l_i$,
and that $(x', y')$ do not satisfy (7) for $x = x'$ and $y = y'$. If a step of Algorithm 3.3 were to be
applied to $(x', y')$, then we get new iterates $(x'', y'')$ such that $\|x'' - y''\| < \|x' - y'\|$. If the
assumption in (12) were dropped, then if the iterates $(x_j, y_j)$ approach $(x', y')$, the next iterates
$(x_{j+1}, y_{j+1})$ approach two possible limits, say $(\tilde{x}, \tilde{y})$ and $(\hat{x}, \hat{y})$, and both $\|\tilde{x} - \tilde{y}\| < \|x' - y'\|$ and
$\|\hat{x} - \hat{y}\| < \|x' - y'\|$.

The assumption that a subsequence of $\{(x_j, y_j)\}$ converges to $(x', y')$ implies that the distance
between iterates will not go below $\|x' - y'\|$, while the continuity of new iterates from the old
iterates implies that there are sequences of iterates whose distances are arbitrarily close to $\|x'' - y''\|$.
With minor adjustments, we can show that a similar condition holds for the case where (12)
fails. This is a contradiction, and gives us the conclusion we seek. $\Box$

This theorem also tells us that Algorithm 3.3(H)(MV), Algorithm 3.3(3D) and Algorithm
3.3(H)(3D) converge.

While the convergence analysis pales in comparison to analogous results in optimization, it
highlights that the second direction $d_2$ should be chosen so that
\[
\left\langle \frac{\nabla f_j(0)}{\|\nabla f_j(0)\|}, e_1 \right\rangle
\quad \text{and} \quad
\left\langle \frac{\nabla f_j(\|y_j - x_j\| e_1)}{\|\nabla f_j(\|y_j - x_j\| e_1)\|}, -e_1 \right\rangle
\]
should be as far away from 1 as possible to obtain decrease in the distance between components
in the next iterate.

Note that if $f$ were assumed to be such that the Hessian is locally Lipschitz instead, the statement
in Theorem 7.2 need not hold for all $\epsilon > 0$ because the errors in estimating a quadratic model
may lead to an inaccurate estimate of the points minimizing the distance between the components
of the level sets in the affine space.

      Objective
(1)   Minimizing $\|x_j - y_j\|$, the distance between points in different
      components of $\{u \mid f(u) \le l_i\}$.
(2)   Maximizing
      $\min\left( \left\langle \frac{\nabla f(x_j)}{\|\nabla f(x_j)\|}, \frac{y_j - x_j}{\|y_j - x_j\|} \right\rangle, \left\langle \frac{\nabla f(y_j)}{\|\nabla f(y_j)\|}, \frac{x_j - y_j}{\|x_j - y_j\|} \right\rangle \right)$,
      an optimality condition for (1). (See Proposition 3.1.)
(3)   Minimizing $\|\nabla f(\frac{1}{2}(x_j + y_j))\|$, the gradient at the midpoint.
(4)   Minimizing $\|\bar{z} - \frac{1}{2}(x_j + y_j)\|$, the distance between the critical point and
      the midpoint.

Table 3. Objective functions used in our numerical experiments and the associated
greedy algorithms.

8. Numerical experiments 

We now describe our numerical experiments in Matlab to test our algorithm.¹ Specifically, we
shall test Algorithm 3.3 for the various choices of $d_2$ in Table 2.



8.1. Objectives and greedy algorithms. We shall only be concerned with running Algorithm
3.3(H) for a particular value $l_i$. While the clear objective in Algorithm 3.3(H) is to find the two
closest points of the components of $\{u \mid f(u) \le l_i\}$, we also use other objectives listed in Table 3
in our numerical experiments. Objectives (1) to (3) can be calculated as the algorithm progresses,
but objective (4) is what one really wants to compute. In ill-conditioned problems, objective (2)
may be close to the optimal value of 1, but far from achieving the minimum distance in objective
(1).
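Objectives (1) to (3) only need the current iterates and gradient evaluations, which the following sketch makes concrete. It assumes objective (2) is the smaller of the two cosines between the gradients at $x_j$, $y_j$ and the directions towards the other point. The function name `objectives` and the test function are illustrative only.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def objectives(grad_f, x, y):
    """Return objectives (1), (2) and (3) for the pair of iterates (x, y)."""
    diff = [b - a for a, b in zip(x, y)]            # y - x
    dist = norm(diff)                               # objective (1)
    gx, gy = grad_f(x), grad_f(y)
    cos_x = dot(gx, diff) / (norm(gx) * dist)       # gradient at x vs. y - x
    cos_y = -dot(gy, diff) / (norm(gy) * dist)      # gradient at y vs. x - y
    mid = [(a + b) / 2.0 for a, b in zip(x, y)]
    return dist, min(cos_x, cos_y), norm(grad_f(mid))

def grad_f(u):
    # Gradient of the illustrative quadratic f(u) = 0.5 * (u1^2 - u2^2).
    return [u[0], -u[1]]

# (0, -1) and (0, 1) are the closest points of the two components of {f <= -0.5}.
obj1, obj2, obj3 = objectives(grad_f, [0.0, -1.0], [0.0, 1.0])
```

At the closest pair, objective (2) attains its optimal value 1 and the gradient at the midpoint vanishes, since the midpoint is the saddle point of this quadratic.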

Other than the choice of directions in Table 2, we introduce greedy algorithms to study
the choice of direction $d_2$ in Algorithm 3.3. In what follows, the strategy (G1) will mean that at
each step of the iteration, Algorithm 3.3 tries out all directions $d_2$ in Table 2, then chooses the
direction that best minimizes objective (1) in Table 3. The strategies (G2), (G3) and (G4) are
similar. Strategy (G4) is an "invisible hand" that brings the iterates as close to the true saddle
point as possible. It is not practical because the knowledge of the true saddle would mean that
there is no need to run the algorithm.

¹ The Matlab codes are available at http://math.mit.edu/~chj2pang/mtn_code.tar.gz

In typical applications of the problem of finding a saddle point of mountain pass type, the cost
of evaluating the function and its gradient is high, and may take hours or longer. Compared to
Algorithm 3.3, Algorithm 3.3(H) (see Remark 3.4) takes advantage of the quadratic formulation
to obtain fast convergence to the saddle point. Therefore, we shall only perform experiments to
study Algorithm 3.3(H). Our experiments would give an indication of how fast the convergence
to the critical point would be once the quadratic approximation is reliable.

8.2. Numerical experiments. If $f : \mathbb{R}^n \to \mathbb{R}$ is a quadratic, then we can, with an orthogonal
transformation and translation, assume that $f(x) = \frac{1}{2} x^T D x$, where $D \in \mathbb{R}^{n \times n}$ is diagonal. The
critical point is 0, and the critical value is 0. We shall also assume that the diagonal entries in $D$
are arranged in descending order. This method also produces ill-conditioned matrices $D$. In an
implementation of Algorithm 3.3, invoking Proposition 2.1 in step 3 to approximate the parameters
of the quadratic approximation can be numerically difficult. We ignore such difficulties for
the time being and study the effects of the different strategies discussed in this paper instead.

To start off our experiments, we generate the diagonal entries of $D$ randomly from the uniform
distribution on $[0, 1]$ using the "rand" function in Matlab. The last eigenvalue will be chosen to be
negative, and the rest will be positive. The critical level of $f$ is 0, and we choose two points $x_0$ and
$y_0$ such that $f(x_0) = f(y_0) = -1$. The points $x_0$ and $y_0$ are chosen as follows. First, we choose
the first $n - 1$ coordinates of $x_0$ and $y_0$ randomly from the normal distribution using the "randn"
function in Matlab. Next, we choose the last coordinate of $x_0$ and $y_0$ so that $f(x_0) = f(y_0) = -1$.
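The construction of the test problem can be sketched in a few lines. The sketch is written in Python rather than Matlab and is not the author's code. The function name `make_test_problem`, the seeding, and the choice of opposite signs for the last coordinates of $x_0$ and $y_0$ (so that the two points lie in different components) are assumptions.

```python
import math
import random

def make_test_problem(n, seed=0):
    """Build f(x) = 0.5 * x^T D x with one negative eigenvalue and two
    starting points on the level set f = -1 (all names are illustrative)."""
    rng = random.Random(seed)
    # n - 1 positive diagonal entries drawn uniformly, sorted in descending
    # order, followed by one negative entry (1 - random() avoids exact zeros).
    d = sorted((1.0 - rng.random() for _ in range(n - 1)), reverse=True)
    d.append(-(1.0 - rng.random()))

    def f(x):
        return 0.5 * sum(di * xi * xi for di, xi in zip(d, x))

    def point_on_level(sign):
        head = [rng.gauss(0.0, 1.0) for _ in range(n - 1)]
        positive_part = sum(di * xi * xi for di, xi in zip(d, head))
        # Solve 0.5 * (positive_part + d[-1] * last^2) = -1 for the last coordinate.
        last = sign * math.sqrt((2.0 + positive_part) / -d[-1])
        return head + [last]

    return d, f, point_on_level(1.0), point_on_level(-1.0)

d, f, x0, y0 = make_test_problem(5, seed=1)
```

Because the last coordinate is divided by the magnitude of the negative eigenvalue, a small negative entry pushes $x_0$ and $y_0$ far apart, which is one way ill-conditioning enters the experiments.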



We first observe the effect of different strategies in Tables 2 and 3 (except for (3D), which
is provably superior to the others). The results are summarized in Tables 4 and 5.
The following observations can be made for the different strategies:

(1) A greedy method may not be better than a pure strategy in the long term.

(2) The iterates produced by (MV) are the best for objectives (1) and (2) in the short and
medium term.

(3) The iterates produced by (MG) have the best convergence of $\frac{1}{2}(x_j + y_j)$ to the critical point
(objective (4)) and of reducing the norm of the gradient (objective (3)). Note that as the
algorithm progresses, $\|x_j - y_j\|$ will get smaller, so $\nabla f(x_j)$ and $\nabla f(y_j)$ will be far too
similar after some point. While (3D) can provably do a much better job if such numerical
errors are not encountered, the above observation suggests that (MG) is a good strategy
once close to the critical point.

(4) The iterates produced by (MD) have the best decrease for objectives (1), (3) and (4) in the
first iteration, and this decrease is sustained for the next few iterations for (3) and (4).
For objectives (1) and (2), once (MD) switches to (MV) to overcome ill-conditioning,
the performance of the iterates catches up quickly to do just as well as the pure strategy
(MV).

We now study the performance of all pure strategies, including (3D). The performance is
shown in Figure 4. Here are some observations from Figure 4:

(1) The strategy (3D) is the best among all choices in Table 2, as expected.

(2) The strategy (MG) performs the poorest in the long run for objectives (1) and (2), as is
consistent with the data in Tables 4 and 5.

(3) The strategy (MG) performs better than (MV) and (PM) for objectives (3) and (4), as is
consistent with the data in Tables 4 and 5.

(4) The strategy (MD) is the best in the first few iterations for objectives (3) and (4). This
behavior persists even as (MD) switches to (MG) when ill-conditioning is encountered.
For objectives (1) and (2), (MD) becomes competitive with the other strategies after it
switches to (MV). This switching explains the sharp decline in the graphs in objectives
(1) and (2) in Figure 4.

Percentage of times Objective (1) is optimal for different strategies

     (H) + Strategy
     Pure strategies             Pure and greedy strategies
  #  (MV)  (PM) (MD)*  (MG)    (MV)  (PM) (MD)*  (MG)  (G1)  (G2)  (G3)  (G4)
  1     1     2    75    22       1     2    75    22
  2    51    29          13      21          14    13    39
  3    34    23          43             3                63    34
  4    70    23           7       1     6                36    56     1
  5    78    21           1      33    14                24    28     1
  6    83    17                  12    10                43    35
  7    74    22     4             3    10                47    40

Percentage of times Objective (2) is optimal for different strategies

Table 4. In a run of 100 experiments for n = 100 for the first 15 iterations, the
percentage of times for which the corresponding strategy is optimal is recorded.
We compare among pure strategies in the first four columns, and compare the
pure strategies together with the greedy strategies in the next eight columns. In
strategy (MD), once ill-conditioning is encountered, we switch to (MV).



Percentage of times Objective (3) is optimal for different strategies

Percentage of times Objective (4) is optimal for different strategies

Table 5. Continuation of the same experiment from Table 4. In strategy
(MD), once ill-conditioning is encountered, we now switch to (MG) instead.

9. Conclusion and open questions

We presented an algorithm for finding a saddle point of mountain pass type using quadratic
models on affine spaces. Our algorithm is similar in some ways to the conjugate gradient algorithm.
The choices one has to make in implementing the algorithm are explained, and some
theoretical and numerical results are presented. We also explain briefly how our ideas can be
implemented in a path-based mountain pass algorithm.

Much still needs to be done. For example, formulas similar to that of Theorem 7.1 describing
the convergence of the various implementations will be helpful in fine tuning the algorithm.
Lastly, our algorithm is only "local" in the sense that it works well only when the iterates are
close to the saddle point where the quadratic approximation becomes more accurate. An emphasis
should also be placed on designing a "global" algorithm.



[Figure 4: four panels, with vertical axes "log of (distance between components) - (optimal distance)",
"log of violation in optimality condition for distance", "log of norm of gradient" and
"log of distance to the saddle point".]

Figure 4. Performance of various strategies for a random example. For objectives
(1) and (2), the strategies (MD) and (3D) switch to (MV) once ill-conditioning
is encountered. For objectives (3) and (4), the strategies (MD)
and (3D) switch to (MG) instead if ill-conditioning is encountered.